Assessing and Visualizing Simultaneous Simulation Error

PML Journal Club: Teemu Säilynoja

2023-03-28

Assessing and Visualizing Simultaneous Simulation Error

Nathan Robertson, James M. Flegal, Dootika Vats, Galin L. Jones

Journal of Computational and Graphical Statistics 2021, Vol. 30, No. 2, 324–334

Motivation

Simultaneous estimation of means and quantiles has received little attention, despite being common practice. (Robertson et al. 2021)

  • Is zero included in the 90% predictive interval?
  • Reproducibility of experiment?

SCI: Summarising predictive distribution

Problem

Let \(\pi\) be a probability density with support \(\mathcal X \subseteq \mathbb R^d\) and \(X\sim \pi\).

Denote with \(\mathbf{m} \in \mathbb R^{p_1}\) and \(\mathbf{q} \in \mathbb R ^{p_2}\) the means and quantiles of interest.

Above,

\[ m_i = \mathbb E_\pi\left(g_i(X)\right) = \int_{\mathcal X}g_i(x)\pi(dx),\] for some \(g:\mathcal X \to \mathbb R^{p_1}\).

And

\[ q_i = F_{h_i}^{-1}(p_{q_i}) = \inf\left\lbrace v : F_{h_i}(v)\geq p_{q_i}\right\rbrace,\] with \(h:\mathcal X \to \mathbb R^{p_2}\), where \(V = h_i(X)\) is distributed according to \(F_{h_i}(v)\), which is absolutely continuous and has continuous density function \(f_{h_i}(v)\).
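As a minimal sketch of the plug-in estimates, assume the draws \(X_1, \dots, X_n\) are stored in a NumPy array. The choices of \(g\) (the identity) and \(h\) (the first coordinate), the probability levels, and the helper name `empirical_quantile` are illustrative assumptions, not taken from the paper. The helper applies the inf-definition of the inverse CDF above to the empirical distribution.

```python
import numpy as np


def empirical_quantile(v, prob):
    """Smallest sample value whose empirical CDF is at least prob."""
    v_sorted = np.sort(np.asarray(v))
    k = max(int(np.ceil(prob * len(v_sorted))) - 1, 0)
    return v_sorted[k]


rng = np.random.default_rng(1)
x = rng.normal(size=(5000, 2))  # stand-in for draws X_1, ..., X_n from pi

g_vals = x        # assumed g(x) = x, so p_1 = 2 means
h_vals = x[:, 0]  # assumed h_1(x) = h_2(x) = x_1, at two probability levels

m_hat = g_vals.mean(axis=0)  # sample means estimate m
q_hat = np.array([empirical_quantile(h_vals, pr) for pr in (0.05, 0.95)])
```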

  1. Marginal Monte Carlo standard errors (MCSEs) are not enough: we want to assess the uncertainty of \(\mathbf m\) and \(\mathbf q\) jointly.
  2. Let \(\nu = \begin{bmatrix}\mathbf m\\\mathbf q\end{bmatrix}\), and let \(\hat\nu_n\) be a simulation-based estimate of \(\nu\) from \(n\) draws.

Even if we find \(\Xi\in \mathbb R ^{p \times p}\), with \(p = p_1 + p_2\), s.t.

\[\sqrt n(\hat\nu_n - \nu) \to \mathcal N(0, \Xi) \quad \text{in distribution as $n \to \infty$,}\]

visualizing the elliptical confidence regions is difficult.

Contributions

  1. Multivariate central limit theorem for any finite combination of sample means and quantiles under the assumption of a strongly mixing process.

  2. Fast algorithm for constructing hyperrectangular confidence regions.

Multivariate CLT

Let \(\left\lbrace X_t \right\rbrace\) be strictly stationary and strongly mixing.

Let \(A_h\) be a \(p_2 \times p_2\) diagonal matrix with \(A_{h}[i,i] = f_{h_i}(q_i)\) and define \[ \Lambda = \begin{bmatrix} I_{p_1 \times p_1} & 0_{p_1 \times p_2}\\ 0_{p_2 \times p_1} & A_h \end{bmatrix}. \]

If \(Y_j = \begin{bmatrix} g(X_j), I(h(X_j) > \mathbf q)\end{bmatrix}^T\), and

\[ \Sigma = \text{cov}\left(Y_1, Y_1\right) + \sum_{j=2}^\infty\left[\text{cov}\left(Y_1, Y_j\right) + \text{cov}\left(Y_1, Y_j\right)^T\right]\] is positive definite, then

\[\sqrt n (\hat\nu_n - \nu) \to \mathcal N(0, \Lambda^{-1}\Sigma\Lambda^{-1}) \quad \text{in distribution as $n \to \infty$.}\]
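One way to turn the limit above into numbers is sketched below. It is only an illustration under stated assumptions: \(\Sigma\) is estimated with non-overlapping batch means and the densities \(f_{h_j}(q_j)\) with a Gaussian kernel density estimate. Both are assumed estimator choices, and the function name `asymptotic_cov`, its arguments, and the batch count are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def asymptotic_cov(g_vals, h_vals, probs, n_batches=50):
    """Sketch of an estimate of Lambda^{-1} Sigma Lambda^{-1}.

    g_vals: (n, p1) array of g(X_t); h_vals: (n, p2) array of h(X_t);
    probs: length-p2 sequence of quantile levels.
    """
    n, p1 = g_vals.shape
    p2 = h_vals.shape[1]
    q_hat = np.array([np.quantile(h_vals[:, j], probs[j]) for j in range(p2)])
    # Y_t = (g(X_t), I(h(X_t) > q)), with q replaced by its estimate.
    y = np.hstack([g_vals, (h_vals > q_hat).astype(float)])
    # Batch-means estimate of Sigma (an assumed estimator choice).
    b = n // n_batches
    batch_means = y[: b * n_batches].reshape(n_batches, b, p1 + p2).mean(axis=1)
    sigma_hat = b * np.atleast_2d(np.cov(batch_means, rowvar=False))
    # Diagonal of A_h: kernel density estimates of f_{h_j} at q_j.
    dens = np.array([gaussian_kde(h_vals[:, j])(q_hat[j])[0] for j in range(p2)])
    lam_inv = np.diag(np.concatenate([np.ones(p1), 1.0 / dens]))
    return lam_inv @ sigma_hat @ lam_inv


rng = np.random.default_rng(0)
x = rng.normal(size=(20000, 1))  # i.i.d. draws as a simple stand-in for a chain
cov_hat = asymptotic_cov(x, x, probs=[0.5])  # mean and median of one variable
```

For i.i.d. standard normal draws the limiting matrix for the mean and median is \(\begin{bmatrix}1 & 1\\ 1 & \pi/2\end{bmatrix}\), which gives a rough sanity check for `cov_hat`; the batch-means estimate is noisy, so only approximate agreement should be expected.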

Simultaneous Confidence Intervals

  • Lower bound: coverage at most \(1-\alpha\)

  • Upper bound: coverage at least \(1-\alpha\)

Simultaneous Confidence Intervals

Lower- and upper-bound \(p\)-dimensional confidence regions for \(\nu = \begin{bmatrix}\mathbf m \\ \mathbf q \end{bmatrix} \in \mathbb{R}^{p}\), with \(p = p_1 + p_2\).

Let \(\hat\Lambda^{-1}\hat\Sigma\hat\Lambda^{-1}\) be a strongly consistent estimator of \(\Lambda^{-1}\Sigma\Lambda^{-1}\).

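\[ C_{SI}(z) := \prod_{i=1}^{p_1}\left[\bar m_{i} - z\frac{\hat\sigma_i}{\sqrt n},\ \bar m_{i} + z\frac{\hat\sigma_i}{\sqrt n}\right] \times \prod_{j=1}^{p_2}\left[\hat q_{j} - z\frac{\hat\gamma_{j}}{\sqrt n},\ \hat q_{j} + z\frac{\hat\gamma_{j}}{\sqrt n}\right], \]

where \(\hat\sigma_i^2\) is the \(i\)th diagonal element of the upper-left \(p_1 \times p_1\) block of \(\hat\Sigma\), and \(\hat\gamma_j^2\) is the \(j\)th diagonal element of \(\hat A_h^{-1}\hat\Sigma_h\hat A_h^{-1}\), with \(\hat\Sigma_h\) the lower-right \(p_2 \times p_2\) block of \(\hat\Sigma\).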

\(C_{LB} = C_{SI}\left(\Phi^{-1}\left(1-\frac\alpha 2\right)\right)\) has coverage of at most \(1-\alpha\).

\(C_{UB} = C_{SI}\left(\Phi^{-1}\left(1-\frac\alpha {2p}\right)\right)\) has coverage of at least \(1-\alpha\).
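The two bounding regions can be sketched as follows, assuming the point estimates and their asymptotic standard deviations (square roots of the diagonal of the estimated asymptotic covariance) are already available. The half-width of each interval is \(z \cdot \text{sd} / \sqrt n\). The function names and the numbers in the example are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def hyperrectangle(est, sd, n, z):
    """Rows are [lower, upper] limits est_i -/+ z * sd_i / sqrt(n)."""
    half_width = z * np.asarray(sd) / np.sqrt(n)
    return np.column_stack([est - half_width, est + half_width])


def bound_regions(est, sd, n, alpha=0.1):
    """Uncorrected (lower-bound) and Bonferroni (upper-bound) regions."""
    p = len(est)
    z_lb = norm.ppf(1 - alpha / 2)        # marginal intervals, no correction
    z_ub = norm.ppf(1 - alpha / (2 * p))  # Bonferroni correction over p terms
    return hyperrectangle(est, sd, n, z_lb), hyperrectangle(est, sd, n, z_ub)


# Made-up estimates and standard deviations for p = 3 quantities.
est = np.array([0.0, 1.0, 2.0])
sd = np.array([1.0, 2.0, 3.0])
c_lb, c_ub = bound_regions(est, sd, n=100, alpha=0.1)
```

Because `z_lb <= z_ub`, every interval of `c_lb` is contained in the matching interval of `c_ub`.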

Simultaneous Confidence Intervals

Find \(C_{\alpha} = C_{SI}(z_\alpha)\) with coverage \(1 - \alpha\) s.t. \(C_{LB}\subseteq C_{\alpha} \subseteq C_{UB}\)

  • 1-D optimization task w.r.t. \(z\in \left[\Phi^{-1}\left(1-\frac\alpha 2\right), \Phi^{-1}\left(1-\frac\alpha {2p}\right)\right]\)

  • Use quasi-Monte Carlo methods to evaluate the coverage level w.r.t. the multivariate normal distribution.
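The search over \(z\) can be sketched as a bisection, assuming the standardized errors are approximately \(\mathcal N(0, R)\) with \(R\) the correlation matrix of the estimated asymptotic covariance. The region \(C_{SI}(z)\) then covers \(\nu\) exactly when the largest absolute standardized error is at most \(z\). The function name `find_simultaneous_z`, the sample size, the tolerance, and the example correlation matrix are assumptions made for illustration; this is not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm, qmc


def find_simultaneous_z(corr, alpha=0.1, n_draws=2**14, tol=1e-4, seed=0):
    """Bisect for z with P(max_i |Z_i| <= z) close to 1 - alpha, Z ~ N(0, corr)."""
    p = corr.shape[0]
    # Quasi-Monte Carlo draws from N(0, corr) based on scrambled Sobol' points.
    sampler = qmc.MultivariateNormalQMC(mean=np.zeros(p), cov=corr, seed=seed)
    max_abs = np.abs(sampler.random(n_draws)).max(axis=1)

    def coverage(z):
        return np.mean(max_abs <= z)

    lo = norm.ppf(1 - alpha / 2)        # uncorrected: coverage at most 1 - alpha
    hi = norm.ppf(1 - alpha / (2 * p))  # Bonferroni: coverage at least 1 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi, coverage(hi)


# Assumed example: p = 4 quantities with pairwise correlation 0.5.
corr = 0.5 * np.eye(4) + 0.5
z_alpha, cov_alpha = find_simultaneous_z(corr, alpha=0.1)
```

Reusing one fixed set of draws makes the estimated coverage a monotone step function of \(z\), so plain bisection is enough and no derivative information is needed.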

SCI: Method comparison

SCI: Posterior intervals

Conclusion

  • Pay attention to the simultaneous uncertainty of all the estimated quantities.
    • At a minimum, confirm that the simulation error is low enough to justify traditional visualizations.
  • Robertson et al. (2021) provide a fast algorithm for computing simultaneous confidence intervals.

References

Robertson, Nathan, James M. Flegal, Dootika Vats, and Galin L. Jones. 2021. “Assessing and Visualizing Simultaneous Simulation Error.” Journal of Computational and Graphical Statistics 30 (2): 324–34. https://doi.org/10.1080/10618600.2020.1824871.